Abstract

In this paper we investigate the rate of convergence of the so-called two-armed 
bandit algorithm in a financial context of asset allocation. The behaviour of the 
algorithm turns out to be highly non-standard: no CLT whatever the time scale, 
possible existence of two rate regimes.

Introduction

In a recent joint work with P. Tarres we studied the convergence of the so-called
two-armed bandit algorithm. In the terminology of learning theory (see e.g. [SI llOj)
this algorithm is a Linear Reward Inaction (LRI) scheme. Viewed as a Markovian
Stochastic Approximation (SA) recursive procedure, it appears as the simplest example
of an algorithm having two possible limits - its target and a trap - both noiseless. In
SA theory a target is a stable equilibrium of the Ordinary Differential Equation (ODE)
associated to the mean function of the algorithm, a trap being an unstable one. Various
results from SA theory show that an algorithm never "falls" into a noisy trap (see e.g. [SJ
[T3l [21 ^5). We established in |H] that the two-armed bandit algorithm can be either
infallible (i.e. converging to its target with probability one, starting from any initial value
except the trap itself) or fallible. This depends on the speed at which the (deterministic)
learning rate parameter goes to 0.

Our aim in this paper is to investigate the rate of convergence of the algorithm
toward either of its limits. In fact, the algorithm behaves in a highly non-standard way
among SA procedures. In particular, this rate is never ruled by a Central Limit Theorem
(CLT). Furthermore, this study will provide some new insight into the infallibility problem,
as will be seen further on. However, our motivations are not only theoretical but also
practical, in connection with the financial context in which the algorithm was presented
in [51, namely a procedure for the optimal allocation of a fund between the two traders who
manage it. Imagine that the owner of a fund can share his wealth between two traders,
say A and B, and that, every day, he can evaluate the results of one of the traders and,
subsequently, modify the percentage of the fund managed by both traders. Denote by $X_n$
the percentage managed by trader A at time $n$. We assume that the owner selects the
trader to be evaluated at random, in such a way that the probability that A is evaluated
at time $n$ is $X_n$, in order to select preferably the trader in charge of the greater part of the
fund. In the LRI scheme, if the evaluated trader performs well, his share is increased by a
fraction $\gamma_n \in (0, 1)$ of the share of the other trader, and nothing happens if the evaluated
trader performs badly. Therefore, the dynamics of the sequence $(X_n)_{n\ge 0}$ can be modelled
as follows:

\[ X_{n+1} = X_n + \gamma_{n+1}\left(\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}> X_n\}\cap B_{n+1}}X_n\right), \qquad X_0 = x \in [0, 1], \]

where $(U_n)_{n\ge 1}$ is an i.i.d. sequence of uniform random variables on the interval [0, 1], $A_n$
(resp. $B_n$) is the event "trader A (resp. trader B) performs well at time $n$". We assume
$\mathbb{P}(A_n) = p_A$, $\mathbb{P}(B_n) = p_B$ for $n \ge 1$, with $p_A, p_B \in (0, 1)$, and independence between
these events and the sequence $(U_n)_{n\ge 1}$. The point is that the owner of the fund does not
know the parameters $p_A$, $p_B$. Note that this procedure is [0, 1]-valued and that 0 and 1
are absorbing states. The $\gamma_n$ parameter is the learning rate of the procedure (we will say
from now on reward to take into account the modelling context).
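As an illustration, the recursion above can be simulated in a few lines. This is only a minimal sketch under stated assumptions: the function names `lri_step` and `simulate`, the seed handling and the choice of step $\gamma_n = c/(c+n)$ (the family considered later in the text) are hypothetical. Only the indicator of the evaluated trader is drawn, which does not change the law of $(X_n)$.

```python
import random


def lri_step(x, gamma, p_a, p_b, rng):
    """Return X_{n+1} given X_n = x and the step gamma = gamma_{n+1}."""
    u = rng.random()  # plays the role of U_{n+1}
    # The text uses {U <= X}; the strict test differs only on a null event
    # and keeps 0 and 1 exactly absorbing in floating point.
    if u < x:
        # Trader A is evaluated and rewarded only if A performs well.
        if rng.random() < p_a:
            return x + gamma * (1.0 - x)
    else:
        # Trader B is evaluated and rewarded only if B performs well.
        if rng.random() < p_b:
            return x - gamma * x
    return x


def simulate(x0, n_steps, p_a, p_b, c=1.0, seed=0):
    """Simulate X_0, ..., X_{n_steps} with the assumed step gamma_n = c / (c + n)."""
    rng = random.Random(seed)
    path = [x0]
    for n in range(1, n_steps + 1):
        path.append(lri_step(path[-1], c / (c + n), p_a, p_b, rng))
    return path
```

For instance, `simulate(0.5, 10_000, p_a=0.6, p_b=0.4)` returns one trajectory of the share managed by trader A; different seeds give different trajectories.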

This recursive learning procedure has been designed in order to assign progressively
the whole fund to the best trader when $p_A \neq p_B$. From now on we will assume without
loss of generality that $p_A > p_B$. This means that $X_n$ is expected to converge toward its
target 1 with probability 1 provided $X_0 \in (0, 1)$ (and consequently never to get trapped
in 0). However this "infallibility" property needs some very stringent assumption on the
reward parameter $\gamma_n$: thus, if $\gamma_n = \left(\frac{C}{C+n}\right)^{\alpha}$, $n \ge 1$, with $0 < \alpha \le 1$ and $C > 0$, it is shown
in [HI (see Corollary 1(b)) that the algorithm is infallible if and only if $\alpha = 1$ and $C \le \frac{1}{p_B}$.

In a standard SA framework, when an algorithm is converging to its target - i.e. a
zero $x^*$ of its mean function $h(x) = \mathbb{E}\big((X_{n+1}-X_n)/\gamma_{n+1} \,|\, X_n = x\big)$ which is stable for the ODE $\dot{x} = h(x)$ -
its rate is ruled by a CLT at a $\sqrt{\gamma_n}$-rate with an asymptotic variance $\sigma^2_{x^*}$ related to the
asymptotic excitation of $x^*$ by the noise (see [Tl IHl IT^).

As concerns the two-armed bandit algorithm, there is no exciting noise at 1 (nor at
0 indeed). This is made impossible simply because both equilibrium points lie at the
boundary of the state space [0, 1] of the algorithm (otherwise the algorithm would leave
the unit interval when getting too close to its boundary). This same feature, which causes
the fallibility of the algorithm when $\gamma_n$ goes to 0 too slowly, also induces its non-standard
rate of convergence.

To illustrate this behaviour, consider again the steps $\gamma_n = \frac{C}{C+n}$, $n \ge 1$, with $C > 0$.
As a consequence of our main results, one obtains:

• If $C > \frac{1}{p_B}$, the algorithm is fallible with positive probability from any $x \in [0, 1)$ and,
when failing, it goes to 0 at a $n^{-Cp_B}$-rate. The rate of convergence to 1 may vary
according to the parameters, see Section 4.

• If $\frac{1}{p_A-p_B} \le C \le \frac{1}{p_B}$ (this case requires that $2p_B \le p_A$), the algorithm is infallible
from any $x \in (0, 1]$ and goes to 1 at a $n^{-Cp_A}$-rate.

• If $\frac{1}{p_A} < C < \frac{1}{p_A-p_B}$, then the algorithm is infallible (from any $x \in (0, 1]$) and two
rates of convergence to 1 may occur with positive $\mathbb{P}_x$-probability: a "slow" one -
$n^{-C(p_A-p_B)}$ - and a "fast" one - $n^{-Cp_A}$.

• If $C \le \frac{1}{p_A}$, then the algorithm is still infallible from any $x \in (0, 1]$ but only the slowest
rate of convergence "survives", i.e. $n^{-C(p_A-p_B)}$.

In fact the following rule holds true: the greater the real constant $C$ is, the faster the
algorithm $(X_n)$ converges, except that when $C$ is too great the algorithm becomes
fallible, which makes the two-armed bandit a very "moral" procedure. Furthermore, note
that the "blind" choice - $C = 1$ - which ensures infallibility induces a slow rate of convergence
$n^{-(p_A-p_B)}$ since then $C < \frac{1}{p_A}$ (by contrast with the fast rate $n^{-p_A}$). Also note
that this rate is precisely that of the mean algorithm $x_{n+1} = x_n + \gamma_n(p_A - p_B)x_n(1 - x_n)$.
A last feature to be noticed is that the switching between rate regimes takes place
"progressively" as the parameter $C$ grows, since it happens that two different rates coexist with
positive probability.
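The rate of the mean algorithm can be checked numerically. The following is a minimal sketch, assuming the steps $\gamma_n = c/(c+n)$ and the index convention $x_n = x_{n-1} + \gamma_n (p_A - p_B) x_{n-1}(1 - x_{n-1})$; the names `mean_algorithm` and `empirical_exponent` are hypothetical. It estimates the exponent of the decay of $1 - x_n$ from a log-log slope, which should be close to $-c(p_A - p_B)$.

```python
import math


def mean_algorithm(x0, n_steps, p_a, p_b, c=1.0):
    """Iterate x_n = x_{n-1} + gamma_n * (p_a - p_b) * x_{n-1} * (1 - x_{n-1})."""
    pi = p_a - p_b
    xs = [x0]
    for n in range(1, n_steps + 1):
        gamma = c / (c + n)  # assumed step family
        x = xs[-1]
        xs.append(x + gamma * pi * x * (1.0 - x))
    return xs


def empirical_exponent(xs, n):
    """Log-log slope of 1 - x_n between the indices n and 2n."""
    return math.log((1.0 - xs[2 * n]) / (1.0 - xs[n])) / math.log(2.0)
```

With `xs = mean_algorithm(0.5, 20_000, p_a=0.7, p_b=0.3)`, the value `empirical_exponent(xs, 10_000)` can be compared with $-(p_A - p_B) = -0.4$.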

For more exhaustive results, we refer to Section 4. If one thinks again of a practical
implementation of the algorithm, the only reasonable choice for the reward parameter is
$\gamma_n = \frac{1}{n+1}$: it ensures infallibility regardless of the (unknown) values of $p_A$ and $p_B$. But
when these two parameters become too close, the rate of convergence becomes too poor
to remain really efficient. Unfortunately, this is more or less the standard situation: the
daily performances of the traders are usually close, and this can be extended to other fields
where this procedure can be used (experimental psychology, clinical trials, industrial
reliability, ...). One clue to get rid of this dependency is to introduce a "fading" penalization
in the procedure when an evaluated trader has unsatisfactory performances. (By fading
we mean negligible with respect to the reward in order to preserve traders' motivation).
This variant of the two-armed bandit algorithm, which satisfies a pseudo-CLT at a (weak)
$n^{-1/2}$-rate whatever the parameters $p_A$ and $p_B$, is described and investigated in 0.

The paper is organized as follows: Section 1 is devoted to some preliminary results
and technical tools. Section 2 is devoted to the rate of convergence when the algorithm
converges to its trap 0, whereas Section 3 deals with the rate of convergence toward its
target 1. Section 4 proposes a summing up of the results for a natural parameterized
family of reward parameters $\gamma_n$.

Notations: • Let $(a_n)_{n\ge 0}$ and $(b_n)_{n\ge 0}$ be two sequences of positive real numbers. The
symbol $a_n \sim b_n$ means $a_n = b_n + o(b_n)$.

• The notation $\mathbb{P}_x$ is used in reference to $X_0 = x$.

1 Preliminary results

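We first recall the definition of the algorithm. We are interested in the asymptotic behavior
of the sequence $(X_n)_{n\in\mathbb{N}}$, where $X_0 = x$, with $x \in (0, 1)$ and

\[ X_{n+1} = X_n + \gamma_{n+1}\left(\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}> X_n\}\cap B_{n+1}}X_n\right), \qquad n \in \mathbb{N}. \]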

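Here $(\gamma_n)_{n\ge 1}$ is a sequence of nonnegative numbers satisfying

\[ \gamma_n < 1 \quad\text{and}\quad \Gamma_n = \sum_{k=1}^{n}\gamma_k \to +\infty \ \text{ as } n \to \infty, \]

$(U_n)_{n\ge 1}$ is a sequence of independent random variables which are uniformly distributed
on the interval [0, 1], the events $A_n$, $B_n$ satisfy

\[ \mathbb{P}(A_n) = p_A, \qquad \mathbb{P}(B_n) = p_B, \qquad n \ge 1, \]

where $0 < p_B < p_A < 1$, and the sequences $(U_n)_{n\ge 1}$ and $(\mathbf{1}_{A_n}, \mathbf{1}_{B_n})_{n\ge 1}$ are independent.

The natural filtration of the sequence $(U_n, \mathbf{1}_{A_n}, \mathbf{1}_{B_n})_{n\ge 1}$ is denoted by $(\mathcal{F}_n)_{n\ge 0}$ and we set

\[ \pi = p_A - p_B > 0. \]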
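With this notation, we have, for $n \ge 0$,

\[ X_{n+1} = X_n + \gamma_{n+1}\pi X_n(1 - X_n) + \gamma_{n+1}\Delta M_{n+1}, \qquad (1) \]

where $\Delta M_{n+1} = M_{n+1} - M_n$, and the sequence $(M_n)_{n\ge 0}$ is the martingale defined by
$M_0 = 0$ and

\[ \Delta M_{n+1} = \mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1 - X_n) - \mathbf{1}_{\{U_{n+1}> X_n\}\cap B_{n+1}}X_n - \pi X_n(1 - X_n). \]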

One derives from (1) that $(X_n)$ is a [0, 1]-valued submartingale. Hence it converges a.s.
and in $L^1$ to a limit $X_\infty$. Consequently

\[ \sum_{n\ge 1}\gamma_n X_{n-1}(1 - X_{n-1}) < +\infty \quad \text{a.s.}, \]

which in turn shows that $X_\infty = 0$ or $1$ with probability 1. One easily checks (see ^) that
1 is a stable equilibrium of the so-called mean ODE $\dot{x} = \pi x(1 - x)$ with attracting
basin $(0, 1]$ and 0 is a repulsive equilibrium of this ODE (whence the terminology: 1 is a
target and 0 is a trap, see jS] for more details).

The conditional variance process of the martingale $(M_n)$ will play a crucial role in
our analysis, and we will often use the following estimates.

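Proposition 1 We have, for $n \ge 0$,

\[ p_B X_n(1 - X_n) \le \mathbb{E}\left(\Delta M_{n+1}^2 \,|\, \mathcal{F}_n\right) \le p_A X_n(1 - X_n). \]

Proof: We have

\[ \mathbb{E}\left(\Delta M_{n+1}^2 \,|\, \mathcal{F}_n\right) = p_A X_n(1-X_n)^2 + p_B(1-X_n)X_n^2 - \pi^2X_n^2(1-X_n)^2 \]
\[ = X_n(1-X_n)\left(p_A(1-X_n) + p_BX_n - \pi^2X_n(1-X_n)\right) \le X_n(1-X_n)\left(p_A(1-X_n) + p_BX_n\right) \le p_AX_n(1-X_n), \]

where the last inequality follows from $p_B \le p_A$. For the lower bound, note that

\[ p_A(1-X_n) + p_BX_n - \pi^2X_n(1-X_n) = (1-X_n)\left(p_A - \pi^2X_n\right) + p_BX_n \]
\[ \ge (1-X_n)(p_A - \pi) + p_BX_n = p_B, \]

where we have used $\pi X_n \le 1$. $\diamond$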
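The two bounds of Proposition 1 can be checked numerically. Given $X_n = x$, the increment $\Delta M_{n+1}$ takes only three values (A rewarded, B rewarded, no reward), so its conditional second moment is a finite sum. The sketch below, whose function name `conditional_variance` is hypothetical, evaluates this sum directly.

```python
def conditional_variance(x, p_a, p_b):
    """E(Delta M_{n+1}^2 | F_n) when X_n = x, summed over the three outcomes."""
    pi = p_a - p_b
    drift = pi * x * (1.0 - x)
    outcomes = [
        # (probability, value of Delta M_{n+1})
        (x * p_a, (1.0 - x) - drift),                 # A evaluated, performs well
        ((1.0 - x) * p_b, -x - drift),                # B evaluated, performs well
        (1.0 - x * p_a - (1.0 - x) * p_b, -drift),    # no reward
    ]
    return sum(prob * value ** 2 for prob, value in outcomes)
```

Comparing `conditional_variance(x, p_a, p_b)` with `p_b * x * (1 - x)` and `p_a * x * (1 - x)` on a grid of values of `x` illustrates the proposition.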

2 Convergence to the trap 

We first prove that, under rather general conditions, as soon as the sequence converges to
the trapping state 0, it goes to it very fast in the sense that the series $\sum_n X_n$ is convergent.

Proposition 2 If

\[ \liminf_n \left(\frac{1}{\gamma_{n+1}} - \frac{1}{\gamma_n}\right) > -\pi \qquad (2) \]

then

\[ \forall x \in (0, 1), \quad \{X_\infty = 0\} = \Big\{\sum_n X_n < +\infty\Big\} \quad \mathbb{P}_x\text{-a.s.} \]

Note that (2) is satisfied if the sequence $(\gamma_n)_{n\ge 1}$ is nonincreasing (for large enough $n$).

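Proof of Proposition 2. Denote by $E$ the event $\{X_\infty = 0\} \cap \{\sum_n X_n = +\infty\}$. We
want to prove that $\mathbb{P}_x(E) = 0$. We first show that, on $E$,

\[ \liminf_n \frac{X_n}{\gamma_n\sum_{k=1}^{n}X_{k-1}} > 0. \qquad (3) \]

We deduce from (1) that

\[ \frac{X_{n+1}}{\gamma_{n+1}} = \frac{X_n}{\gamma_{n+1}} + \pi X_n(1-X_n) + \Delta M_{n+1} = \frac{X_n}{\gamma_n} + X_n\left(\frac{1}{\gamma_{n+1}} - \frac{1}{\gamma_n} + \pi(1-X_n)\right) + \Delta M_{n+1}. \]

By summing up and setting $\gamma_0 = \gamma_1$, we derive

\[ \frac{X_n}{\gamma_n} = \frac{x}{\gamma_1} + \sum_{k=1}^{n}X_{k-1}\left(\frac{1}{\gamma_k} - \frac{1}{\gamma_{k-1}} + \pi(1-X_{k-1})\right) + M_n. \]

From Proposition 1 we know that the conditional variance process of $(M_n)$ satisfies

\[ p_B\sum_{k=1}^{n}X_{k-1}(1-X_{k-1}) \le \langle M\rangle_n \le p_A\sum_{k=1}^{n}X_{k-1}(1-X_{k-1}). \]

Therefore, on $E$, we have $\langle M\rangle_\infty = +\infty$ a.s., and using the law of large numbers for
martingales, we deduce that

\[ \lim_n \frac{M_n}{\sum_{k=1}^{n}X_{k-1}} = 0 \quad \text{a.s. on } E. \]

The estimate (3) then follows easily from the assumption (2).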

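Now let $S_n = \sum_{k=1}^{n}X_k$. Note that, on $E$, $S_n \sim \sum_{k=1}^{n}X_{k-1}$, so that, using (3),

\[ \exists C > 0, \ \forall n \ge 1, \quad \gamma_n \le C\,\frac{X_n}{S_n}. \]

This implies

\[ \sum_n \gamma_n^2 \le C^2\sum_n \frac{X_n^2}{S_n^2} \le C^2\sum_n \frac{X_n}{S_n^2} < +\infty, \]

where we have used $X_n \le 1$. We also know from Proposition 9 of [Oj (see (29) in particular)
that, on the set $\{X_n \to 0\}$,

\[ \limsup_n \frac{X_n}{\sum_{k\ge n}\gamma_{k+1}^2} < +\infty \quad \text{a.s.} \]

Hence $X_n \le C\sum_{k\ge n}\gamma_{k+1}^2$ for some $C > 0$, and, by plugging in the estimate $\gamma_{k+1}^2 \le C X_{k+1}^2/S_{k+1}^2$, we derive

\[ X_n \le C\sum_{k\ge n}\frac{X_{k+1}^2}{S_{k+1}^2} \le C\Big(\sup_{k\ge n}X_{k+1}\Big)\sum_{k\ge n}\frac{X_{k+1}}{S_{k+1}^2} \le C\,\frac{\sup_{k\ge n}X_{k+1}}{S_n}. \]

On the set $E$, we have $\lim_{n\to\infty}S_n = +\infty$, so, for $n$ large enough, say $n \ge N$, we have

\[ X_n \le \frac{\sup_{k\ge n}X_{k+1}}{2}. \]

Now, by taking $n$ to be the largest integer such that $X_n \ge X_N$ (which exists on $\{X_n \to 0\}$
because $X_N > 0$), we reach a contradiction, which proves that $\mathbb{P}_x(E) = 0$. $\diamond$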

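Our next result shows that under (2), there is essentially only one way for $(X_n)$ to go
to 0.

Proposition 3 Assume (2).

(a) Let $x \in (0, 1)$. Then

\[ \mathbb{P}_x(X_\infty = 0) > 0 \iff \mathbb{P}\Big(\sum_{n\ge 1}\prod_{k=1}^{n}(1 - \mathbf{1}_{B_k}\gamma_k) < +\infty\Big) > 0 \qquad (4) \]

and, on the event $\{X_\infty = 0\}$, there exists a (random) integer $n_0 \ge 1$ such that

\[ \forall n \ge n_0, \quad X_n = X_{n_0}\prod_{k=n_0+1}^{n}(1 - \mathbf{1}_{B_k}\gamma_k) \quad \text{a.s.} \qquad (5) \]

Note that, as a special case of (4),

\[ \sum_{n\ge 1}\prod_{k=1}^{n}(1 - p_B\gamma_k) < +\infty \implies \mathbb{P}_x(X_\infty = 0) > 0. \qquad (6) \]

(b) Furthermore, if $\sum_{n\ge 1}\gamma_n^2 < +\infty$, (4) reads

\[ \mathbb{P}_x(X_\infty = 0) > 0 \iff \sum_{n\ge 1}\prod_{k=1}^{n}(1 - p_B\gamma_k) < +\infty \]

and moreover there is a random variable $\Xi_x > 0$ such that

\[ X_n \sim \Xi_x\prod_{k=1}^{n}(1 - p_B\gamma_k) \quad \text{a.s. on } \{X_\infty = 0\}. \]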

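Remark 1 If $\sum_{n\ge 1}\gamma_n^2 = +\infty$, a weaker (but still tractable) sufficient condition for $\mathbb{P}_x(X_\infty = 0) > 0$
is given by

\[ \exists \rho \in \big(0, p_B(1-p_B)/2\big), \quad \sum_{n\ge 1}e^{-\rho\,\Gamma_n^{(2)}}\prod_{k=1}^{n}(1 - p_B\gamma_k) < +\infty, \]

where $\Gamma_n^{(2)} = \sum_{1\le k\le n}\gamma_k^2$ (see the proof of Proposition 3). Then, on the set $\{X_n \to 0\}$, for
every $\eta \in \big(0, p_B(1-p_B)/2\big)$,

\[ X_n = o\Big(e^{-\eta\,\Gamma_n^{(2)}}\prod_{k=1}^{n}(1 - p_B\gamma_k)\Big) \quad \text{a.s.} \]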

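Remark 2 Note that the condition in (4) which characterizes fallibility does not depend
on $x$: if the algorithm is fallible for one $x \in (0, 1)$ then it is for any such $x$.

Proof of Proposition 3. (a) It follows from Proposition 2 and the conditional Borel-Cantelli
Lemma that $\mathbb{P}_x$-a.s.

\[ \{X_n \to 0\} = \Big\{\sum_{n\ge 0}\mathbf{1}_{\{U_{n+1}\le X_n\}} < +\infty\Big\} = \bigcup_{n\ge 0}\bigcap_{k\ge n}\{U_{k+1} > X_k\}. \qquad (7) \]

The sequence of events $\big(\bigcap_{k\ge n}\{U_{k+1} > X_k\}\big)_{n\ge 1}$ being non-decreasing, we have

\[ \mathbb{P}_x(X_n \to 0) = \lim_{n\to\infty}\mathbb{P}_x\Big(\bigcap_{k\ge n}\{U_{k+1} > X_k\}\Big) \]

and the left-hand side is positive if and only if, for some integer $n \ge 1$,

\[ \mathbb{P}_x\Big(\bigcap_{k\ge n}\{U_{k+1} > X_k\}\Big) > 0. \]

From the definition of the sequence $(X_n)$, we get (with the convention $\prod_{\emptyset} = 1$)

\[ \bigcap_{k\ge n}\{U_{k+1} > X_k\} = \bigcap_{k\ge n}\Big\{U_{k+1} > X_k \ \text{and} \ X_k = X_n\prod_{\ell=n+1}^{k}(1 - \mathbf{1}_{B_\ell}\gamma_\ell)\Big\} \qquad (8) \]
\[ = \bigcap_{k\ge n}\Big\{U_{k+1} > X_n\prod_{\ell=n+1}^{k}(1 - \mathbf{1}_{B_\ell}\gamma_\ell)\Big\}. \qquad (9) \]

Note that (5) follows from (7) and (8). Now, denote by $\mathcal{B}_n$ the $\sigma$-field generated by the
random variable $X_n$ and the events $B_k$, $k > n$. We have

\[ \mathbb{P}_x\Big(\bigcap_{k\ge n}\Big\{U_{k+1} > X_n\prod_{\ell=n+1}^{k}(1 - \mathbf{1}_{B_\ell}\gamma_\ell)\Big\} \,\Big|\, \mathcal{B}_n\Big) = \prod_{k=n}^{\infty}\Big(1 - X_n\prod_{\ell=n+1}^{k}(1 - \mathbf{1}_{B_\ell}\gamma_\ell)\Big), \]

and the infinite product is positive if and only if

\[ \sum_{k}\prod_{\ell=n+1}^{k}(1 - \mathbf{1}_{B_\ell}\gamma_\ell) < +\infty. \]

This clearly implies (4). The sufficient condition (6) follows from the equality

\[ \mathbb{E}\Big(\sum_{n\ge 1}\prod_{1\le k\le n}(1 - \mathbf{1}_{B_k}\gamma_k)\Big) = \sum_{n\ge 1}\prod_{1\le k\le n}(1 - p_B\gamma_k). \]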

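(b) (and proof of the remark) If $\sum_{n\ge 1}\gamma_n^2 < +\infty$, then a straightforward argument (see [£],
proof of Lemma 2) shows that, for some positive finite random variable $\xi$,

\[ \prod_{k=1}^{n}(1 - \mathbf{1}_{B_k}\gamma_k) \sim \xi\prod_{k=1}^{n}(1 - p_B\gamma_k) \quad \text{a.s.} \]

This proves claim (b).

When $\sum_{n\ge 1}\gamma_n^2 = +\infty$, one checks that

\[ \log\prod_{k=1}^{n}\frac{1 - \mathbf{1}_{B_k}\gamma_k}{1 - p_B\gamma_k} = M^B_n - \sum_{k=1}^{n}\Big(\frac{p_B(1-p_B)}{2} + \varepsilon_k\Big)\gamma_k^2, \]

where $\varepsilon_k$ is a random variable bounded by $c\gamma_k$ ($c$ real constant) and

\[ M^B_n = -\sum_{k=1}^{n}(\mathbf{1}_{B_k} - p_B)\Big(\gamma_k + \frac{\gamma_k^2}{2}\Big) \]

is a martingale with bounded increments satisfying $\langle M^B\rangle_n \sim p_B(1-p_B)\Gamma_n^{(2)} \to +\infty$.
Then

\[ M^B_n = o\big(\Gamma_n^{(2)}\big) \]

since $M^B_n/\langle M^B\rangle_n \to 0$ as $n \to \infty$. Consequently, $\mathbb{P}$-a.s., there exists a finite random variable $\xi$
such that

\[ \prod_{k=1}^{n}(1 - \mathbf{1}_{B_k}\gamma_k) \le \xi\exp\Big(-\Big(\frac{p_B(1-p_B)}{2} + o(1)\Big)\Gamma_n^{(2)}\Big)\prod_{k=1}^{n}(1 - p_B\gamma_k), \]

where $o(1)$ denotes a random variable $\mathbb{P}$-a.s. going to 0 as $n \to \infty$. The sufficient condition
given in the remark follows straightforwardly as well as the rate of convergence of $X_n$. $\diamond$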
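For the steps $\gamma_k = C/(C+k)$ one has $\sum_k \gamma_k^2 < +\infty$, so that the criterion of Proposition 3(b) reduces to the convergence of $\sum_n \prod_{k\le n}(1 - p_B\gamma_k)$. The product decays like $n^{-Cp_B}$, hence the series converges if and only if $Cp_B > 1$. The following sketch, with the hypothetical helper names `log_products` and `decay_exponent`, estimates this exponent numerically from a log-log slope.

```python
import math


def log_products(c, p_b, n_max):
    """Logs of prod_{k<=n} (1 - p_b * gamma_k), n = 0..n_max, gamma_k = c / (c + k)."""
    logs = [0.0]
    for k in range(1, n_max + 1):
        logs.append(logs[-1] + math.log1p(-p_b * c / (c + k)))
    return logs


def decay_exponent(c, p_b, n=10_000):
    """Log-log slope of the product between n and 2n (expected near -c * p_b)."""
    logs = log_products(c, p_b, 2 * n)
    return (logs[2 * n] - logs[n]) / math.log(2.0)
```

For example, `decay_exponent(2.0, 0.4)` should be close to $-0.8$ (series divergent, no fallibility from this criterion), whereas `decay_exponent(3.0, 0.5)` should be close to $-1.5$ (series convergent, fallible algorithm).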

3 Convergence to the target 

In order to study the rate of convergence to 1, we first rewrite (1) as follows:

\[ 1 - X_{n+1} = (1 - X_n)(1 - \gamma_{n+1}\pi X_n) - \gamma_{n+1}\Delta M_{n+1}. \qquad (10) \]

Now let

\[ \theta_n = \prod_{k=1}^{n}(1 - \gamma_k\pi X_{k-1}), \qquad Y_n = \frac{1 - X_n}{\theta_n}, \qquad n \in \mathbb{N}. \]

Proposition 4 (a) The sequence $(Y_n)_{n\in\mathbb{N}}$ is a non-negative martingale.
(b) On the set $\{X_\infty = 1\}$, we have

\[ \lim_{n\to\infty}\frac{1 - X_n}{\prod_{k=1}^{n}(1 - \pi\gamma_k)} = \xi Y_\infty \]

almost surely, where $\xi$ is a finite positive random variable and $Y_\infty = \lim_{n\to\infty}Y_n$.

Proof: The first assertion follows from the equality

\[ Y_{n+1} = Y_n - \frac{\gamma_{n+1}}{\theta_{n+1}}\Delta M_{n+1} \]

and the fact that the sequence $(\theta_n)_{n\in\mathbb{N}}$ is predictable.

As a non-negative martingale, the sequence $(Y_n)_{n\in\mathbb{N}}$ has a limit $Y_\infty$, which satisfies
$Y_\infty \ge 0$ a.s. and $\mathbb{E}(Y_\infty) < +\infty$.

Recall that $\sum_n \gamma_nX_{n-1}(1 - X_{n-1}) < +\infty$ almost surely. Therefore, on $\{X_\infty = 1\}$, we
have $\sum_n \gamma_n(1 - X_{n-1}) < +\infty$ a.s., which implies that the sequence $\prod_{k=1}^{n}\frac{1 - \pi\gamma_kX_{k-1}}{1 - \pi\gamma_k}$ has a
positive and finite limit, and the second assertion of the Proposition follows easily. $\diamond$
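The martingale property of $(Y_n)$ can be verified exactly on one step, since given $X_n = x$ the next state takes only three values and $\theta_{n+1} = \theta_n(1 - \gamma_{n+1}\pi x)$ is known at time $n$. The sketch below (the function name `expected_next_y` is hypothetical) computes $\mathbb{E}(Y_{n+1} \,|\, \mathcal{F}_n)$ as a finite sum, to be compared with $Y_n = (1 - x)/\theta_n$.

```python
def expected_next_y(x, theta, gamma, p_a, p_b):
    """E(Y_{n+1} | F_n) when X_n = x, theta_n = theta and gamma_{n+1} = gamma."""
    pi = p_a - p_b
    theta_next = theta * (1.0 - gamma * pi * x)  # predictable factor
    outcomes = [
        # (probability, value of X_{n+1})
        (x * p_a, x + gamma * (1.0 - x)),          # A rewarded
        ((1.0 - x) * p_b, x - gamma * x),          # B rewarded
        (1.0 - x * p_a - (1.0 - x) * p_b, x),      # no reward
    ]
    return sum(prob * (1.0 - x_next) / theta_next for prob, x_next in outcomes)
```

For any admissible inputs the returned value should coincide, up to rounding, with `(1 - x) / theta`.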

Remark 3 Note that, with the notation $\Gamma_n = \sum_{k=1}^{n}\gamma_k$, we have $\prod_{k=1}^{n}(1 - \pi\gamma_k) \le e^{-\pi\Gamma_n}$.
Therefore, we deduce from Proposition 4 that, on the set $\{X_\infty = 1\}$, $1 - X_n = O(e^{-\pi\Gamma_n})$
almost surely. If we have $\sum_n \gamma_n^2 < +\infty$, the sequence $\big(e^{\pi\Gamma_n}\prod_{k=1}^{n}(1 - \pi\gamma_k)\big)$ converges to
a positive limit, so that, on the set $\{X_\infty = 1\}$, we have $\lim_{n\to\infty}e^{\pi\Gamma_n}(1 - X_n) = \xi'Y_\infty$, with
$\xi' \in (0, +\infty)$ almost surely.

On the other hand, on $\{X_\infty = 0\}$, the sequence $(\theta_n)_{n\in\mathbb{N}}$ itself converges to an almost
surely positive limit, so that $\{Y_\infty = 0\} \subset \{X_\infty = 1\}$.

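Proposition 5 (a) If $\sum_n \gamma_n^2e^{\pi\Gamma_n} < +\infty$, the martingale $(Y_n)_{n\in\mathbb{N}}$ is bounded in $L^2$ and its
limit satisfies $\mathbb{E}(X_\infty Y_\infty) > 0$. Moreover, on the set $\{Y_\infty = 0\}$, we have

\[ \limsup_n \frac{Y_n}{\sum_{k\ge n}\gamma_{k+1}^2e^{\pi\Gamma_{k+1}}} < +\infty \qquad (11) \]

almost surely.

(b) If

\[ \sum_n \gamma_n^2e^{\pi\Gamma_n} = +\infty \quad \text{and} \quad \sup_{n\ge 1}\gamma_ne^{\pi\Gamma_n} < +\infty, \qquad (12) \]

then, for every $x \in (0, 1)$,

\[ \{X_\infty = 1\} = \{Y_\infty = 0\} \quad \mathbb{P}_x\text{-a.s.} \]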

Remark 4 It follows from Proposition 5 and Remark 3 that, if $\sum_n \gamma_n^2e^{\pi\Gamma_n} < +\infty$, on the
set $\{X_\infty = 1\} \cap \{Y_\infty > 0\}$ (which has positive probability) the sequence $\big((1 - X_n)e^{\pi\Gamma_n}\big)_{n\in\mathbb{N}}$
converges to a positive limit almost surely.

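Remark 5 We also derive from the inequality $(1 - X_{n+1}) \ge (1 - X_n)\big(1 - \gamma_{n+1}\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}\big)$
that

\[ 1 - X_n \ge (1 - x)\prod_{k=1}^{n}(1 - \gamma_k\mathbf{1}_{A_k}) \ge Ce^{-p_A\Gamma_n}, \]

for some real constant $C > 0$, if $\sum_n \gamma_n^2 < +\infty$. Therefore, we deduce from Proposition 5
that if $\lim_n \big(e^{p_B\Gamma_n}\sum_{k\ge n}\gamma_{k+1}^2e^{\pi\Gamma_{k+1}}\big) = 0$, then $\mathbb{P}(Y_\infty = 0) = 0$. On the other hand, the
second part of Proposition 5 shows that, in some cases, we may have $1 - X_n = o(e^{-\pi\Gamma_n})$, and
we need to investigate what the real rate of convergence is in such cases: see Proposition 7.

Proof of Proposition 5. (a) Assume $\sum_n \gamma_n^2e^{\pi\Gamma_n} < +\infty$. In order to prove $L^2$-boundedness,
we estimate the conditional variance process. Using Proposition 1 we have

\[ \mathbb{E}_x\big((\Delta Y_{n+1})^2 \,|\, \mathcal{F}_n\big) = \frac{\gamma_{n+1}^2}{\theta_{n+1}^2}\,\mathbb{E}\big(\Delta M_{n+1}^2 \,|\, \mathcal{F}_n\big) \le \frac{\gamma_{n+1}^2}{\theta_{n+1}^2}\,p_AX_n(1 - X_n) \]
\[ = \frac{\gamma_{n+1}^2\theta_n}{\theta_{n+1}^2}\,p_AX_nY_n \le p_A\,\frac{\gamma_{n+1}^2}{(1 - \pi\gamma_{n+1})^2\prod_{k=1}^{n}(1 - \pi\gamma_k)}\,Y_n \]
\[ \le Cp_A\gamma_{n+1}^2e^{\pi\Gamma_{n+1}}Y_n, \qquad (13) \]

where we have used the inequality $\theta_n \ge \prod_{k=1}^{n}(1 - \pi\gamma_k)$ and the fact that, since we have
$\sum_{n\ge 1}\gamma_n^2 < +\infty$, $\prod_{k=1}^{n}(1 - \pi\gamma_k) \ge e^{-\pi\Gamma_n}/C$ for some $C > 0$. Note that $\sup_{n\in\mathbb{N}}\mathbb{E}\,Y_n < +\infty$.
Therefore, the convergence of the series $\sum_n \gamma_n^2e^{\pi\Gamma_n}$ implies that $(Y_n)_{n\in\mathbb{N}}$ is bounded in $L^2$.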
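In order to prove $\mathbb{E}(X_\infty Y_\infty) > 0$, we consider the conditional covariance

\[ \mathbb{E}_x\big((1 - X_n)X_n \,|\, \mathcal{F}_{n-1}\big) = (1 - X_{n-1})X_{n-1}\big(1 + \pi\gamma_n(1 - 2X_{n-1}) + \gamma_n^2\pi X_{n-1} - p_A\gamma_n^2\big) \]
\[ \ge X_{n-1}(1 - X_{n-1})\big(1 - \pi\gamma_nX_{n-1} - p_A\gamma_n^2\big), \]

so that

\[ \mathbb{E}_x\big(X_nY_n \,|\, \mathcal{F}_{n-1}\big) \ge X_{n-1}Y_{n-1}\Big(1 - \frac{p_A\gamma_n^2}{1 - \pi\gamma_nX_{n-1}}\Big) \ge X_{n-1}Y_{n-1}\Big(1 - \frac{p_A\gamma_n^2}{1 - \pi\gamma_n}\Big). \]

For $n$ large enough (say $n \ge n_0$), we have $1 > \frac{p_A\gamma_n^2}{1 - \pi\gamma_n}$ and, by induction, for $n \ge n_0$,

\[ \mathbb{E}_x(X_nY_n) \ge \mathbb{E}_x(X_{n_0}Y_{n_0})\prod_{k=n_0+1}^{n}\Big(1 - \frac{p_A\gamma_k^2}{1 - \pi\gamma_k}\Big). \]

Now, using that $Y_n \to Y_\infty$ and $X_n \to X_\infty$ in $L^2(\mathbb{P})$, and $\sum_n \gamma_n^2 < +\infty$, one finally gets
$\mathbb{E}_x(X_\infty Y_\infty) > 0$. Note that this implies that $\mathbb{P}_x(X_\infty = 1, Y_\infty > 0) > 0$ since $X_\infty$ takes its values in $\{0, 1\}$.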
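The first step to establish (11) is to apply to the martingale $(Y_n)_{n\ge 1}$ an approach
originally developed in jS] to establish the infallibility property for $(X_n)$: for every $n \ge 1$,

\[ \mathbb{P}_x(Y_\infty = 0 \,|\, \mathcal{F}_n) \le \frac{1}{Y_n^2}\,\mathbb{E}_x\big((Y_\infty - Y_n)^2 \,|\, \mathcal{F}_n\big) = \frac{1}{Y_n^2}\sum_{k\ge n+1}\mathbb{E}_x\big((\Delta Y_k)^2 \,|\, \mathcal{F}_n\big). \]

Plugging (13) in the above inequality and using that $\mathbb{E}_x(Y_k \,|\, \mathcal{F}_n) = Y_n$ for every $k \ge n$
yield

\[ \mathbb{P}_x(Y_\infty = 0 \,|\, \mathcal{F}_n) \le \frac{Cp_A}{Y_n}\sum_{k\ge n+1}\gamma_k^2e^{\pi\Gamma_k}. \]

On the other hand the martingale $\mathbb{P}_x(Y_\infty = 0 \,|\, \mathcal{F}_n)$ converges $\mathbb{P}_x$-a.s. toward $\mathbf{1}_{\{Y_\infty = 0\}}$.
The announced result follows easily.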

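(b) We now assume $\sum_n \gamma_n^2e^{\pi\Gamma_n} = +\infty$ and $\sup_n \gamma_ne^{\pi\Gamma_n} < +\infty$. Note that the latter
condition implies $\gamma_n^2 \le C\gamma_ne^{-\pi\Gamma_n}$ for some $C > 0$, so that $\sum_n \gamma_n^2 < +\infty$. On the other
hand, we have

\[ |Y_n - Y_{n-1}| = \frac{\gamma_n}{\theta_n}|\Delta M_n| \le \frac{\gamma_n}{\prod_{k=1}^{n}(1 - \pi\gamma_k)}|\Delta M_n| \le C\gamma_ne^{\pi\Gamma_n}|\Delta M_n|, \]

so that the martingale $(Y_n)_{n\ge 1}$ has bounded increments. Consequently the Law of the Iterated
Logarithm (cf. j4j) implies that $\liminf_n Y_n = -\infty$ on the event $\{\langle Y\rangle_\infty = +\infty\}$, and, since
$Y_n \ge 0$, we deduce thereof that $\langle Y\rangle_\infty < +\infty$ almost surely. On the other hand, we
have, using Proposition 1 and, on $\{X_\infty = 1\}$, the inequality $\theta_n \le Ce^{-\pi\Gamma_n}$ (which holds for some finite random $C$, see the proof of Proposition 4),

\[ \mathbb{E}\big((\Delta Y_n)^2 \,|\, \mathcal{F}_{n-1}\big) = \frac{\gamma_n^2}{\theta_n^2}\,\mathbb{E}\big(\Delta M_n^2 \,|\, \mathcal{F}_{n-1}\big) \ge \frac{\gamma_n^2}{\theta_n^2}\,p_BX_{n-1}(1 - X_{n-1}) \]
\[ = \frac{\gamma_n^2\theta_{n-1}}{\theta_n^2}\,p_BX_{n-1}Y_{n-1} \ge \frac{\gamma_n^2}{\theta_n}\,p_BX_{n-1}Y_{n-1} \]
\[ \ge CX_{n-1}Y_{n-1}\gamma_n^2e^{\pi\Gamma_n}. \]

Therefore, the assumption (12) implies that $Y_\infty = 0$ on the event $\{X_\infty = 1\}$.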

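In order to clarify what happens when $Y_\infty = 0$, we first observe that we have, up to
null events,

\[ \Big\{\sum_n (1 - X_n) < +\infty\Big\} = \Big\{\sum_n \mathbf{1}_{\{U_{n+1} > X_n\}} < +\infty\Big\} \subset \bigcup_{m\ge 1}\bigcap_{n\ge m}\Big\{1 - X_n = (1 - X_m)\prod_{k=m+1}^{n}(1 - \mathbf{1}_{A_k}\gamma_k)\Big\} \]

so that, on the set $\{\sum_n (1 - X_n) < +\infty\}$, we have

\[ 1 - X_n \sim \xi\prod_{k=1}^{n}(1 - \mathbf{1}_{A_k}\gamma_k) \quad \text{a.s.}, \]

where $\xi$ is a positive random variable. Recall that, if $\sum_n \gamma_n^2 < +\infty$, $\prod_{k=1}^{n}(1 - \mathbf{1}_{A_k}\gamma_k) \sim
\xi'e^{-p_A\Gamma_n}$ for some (random) $\xi' > 0$. We thus see that, on the set $\{\sum_n (1 - X_n) < +\infty\}$,
we have a "fast" rate of convergence. The possibility of occurrence of this fast rate is
characterized in the following Proposition.

Proposition 6 We have, for all $x \in (0, 1)$,

\[ \mathbb{P}_x\Big(\sum_n (1 - X_n) < +\infty\Big) > 0 \iff \mathbb{P}\Big(\sum_{n\ge 1}\prod_{k=1}^{n}(1 - \mathbf{1}_{A_k}\gamma_k) < +\infty\Big) > 0. \]

Note that the condition $\sum_{n\ge 1}\prod_{k=1}^{n}(1 - p_A\gamma_k) < +\infty$ implies $\mathbb{P}\big(\sum_{n\ge 1}\prod_{k=1}^{n}(1 - \mathbf{1}_{A_k}\gamma_k) < +\infty\big) = 1$
and that, if $\sum_n \gamma_n^2 < +\infty$, we have

\[ \mathbb{P}_x\Big(\sum_n (1 - X_n) < +\infty\Big) > 0 \iff \sum_{n\ge 1}e^{-p_A\Gamma_n} < +\infty. \]

The proof of Proposition 6 and of these comments is similar to that of the analogous
statements concerning convergence to 0.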

In the following Proposition, we give a sufficient condition for the fast rate to be
achieved with probability one and a sufficient condition under which we have at most two
rates with positive probability: $e^{-\pi\Gamma_n}$ and the fast rate $e^{-p_A\Gamma_n}$.

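Proposition 7 Let $\varepsilon_n = \frac{1}{\gamma_{n+1}} - \frac{1}{\gamma_n} - \pi$ for $n \ge 1$.

(a) If $\sum_n \gamma_n\varepsilon_n^+ < +\infty$, we have $\sum_n (1 - X_n) < +\infty$ almost surely on the set $\{X_\infty = 1\}$.
(b) If $\liminf_n \varepsilon_n > 0$, then $\sum_n \gamma_n^2e^{\pi\Gamma_n} < +\infty$, and, on the event $\{Y_\infty = 0\}$, we have
$\sum_n (1 - X_n) < +\infty$ almost surely.

Note that the condition $\sum_n \gamma_n\varepsilon_n^+ < +\infty$ implies $\liminf_n \varepsilon_n^+ = 0$ and is satisfied in the
following cases:

• the sequence $(\gamma_n)$ is constant,

• $\gamma_n = An^{-\alpha}$ (for large enough $n$), with $A$ a positive constant and $0 < \alpha < 1$,

• $\gamma_n = C/(C + n)$, where the constant $C$ satisfies $\pi C \ge 1$.

On the other hand, if $\gamma_n = C/(C + n)$, with $\pi C < 1$, we have $\liminf_n \varepsilon_n > 0$.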
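The cases listed above can be explored numerically by evaluating $\varepsilon_n$ for a given step sequence. The sketch below is only illustrative: the helper names `harmonic_step`, `power_step`, `constant_step` and `epsilon` are hypothetical. For $\gamma_n = C/(C+n)$ the difference $1/\gamma_{n+1} - 1/\gamma_n$ equals $1/C$ for every $n$, so that $\varepsilon_n = 1/C - \pi$.

```python
def harmonic_step(c):
    """gamma_n = c / (c + n)."""
    return lambda n: c / (c + n)


def power_step(a, alpha):
    """gamma_n = a * n ** (-alpha)."""
    return lambda n: a * n ** (-alpha)


def constant_step(g):
    """gamma_n = g for every n."""
    return lambda n: g


def epsilon(gamma, n, pi):
    """epsilon_n = 1 / gamma_{n+1} - 1 / gamma_n - pi."""
    return 1.0 / gamma(n + 1) - 1.0 / gamma(n) - pi
```

With `pi = 0.3`, the call `epsilon(harmonic_step(2.0), n, pi)` stays near $0.2 > 0$ (case $\pi C < 1$), while `epsilon(harmonic_step(5.0), n, pi)` stays near $-0.1$, so that $\varepsilon_n^+ = 0$ (case $\pi C \ge 1$).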

Before proving Proposition 7 we state and prove a lemma which will be useful for the
proof of the second statement.

Lemma 1 Assume that, for some positive integer $n_0$, $\forall n \ge n_0$, $\varepsilon_n \ge 0$. Then, the
sequence $(Z_n)_{n\ge n_0}$, with $Z_n = (1 - X_n)/\gamma_n$, is a submartingale, and we have $\sum_n (1 - X_n) <
+\infty$ a.s. on the set $\{X_\infty = 1\} \cap \{\sup_n \frac{1 - X_n}{\gamma_{n+1}} < +\infty\}$.

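Remark 6 If $\inf_n \gamma_ne^{\pi\Gamma_n} > 0$, we have (on the event $\{X_\infty = 1\}$) $1 - X_n \le Ce^{-\pi\Gamma_n}$ and
$(1 - X_n)/\gamma_{n+1} \le Ce^{-\pi\Gamma_{n+1}}/\gamma_{n+1}$. Then one can slightly relax the assumption in claim
(b) since it follows from Lemma 1 that, if $\varepsilon_n \ge 0$ for $n$ large enough, $\sum_n (1 - X_n) < +\infty$
almost surely on $\{X_\infty = 1\}$.

Proof of Lemma 1. Starting from (10), we have

\[ \frac{1 - X_{n+1}}{\gamma_{n+1}} = \frac{1 - X_n}{\gamma_{n+1}} - \pi X_n(1 - X_n) - \Delta M_{n+1} \]
\[ = (1 - X_n)\Big(\frac{1}{\gamma_n} + \varepsilon_n + \pi - \pi X_n\Big) - \Delta M_{n+1} \]
\[ = \frac{1 - X_n}{\gamma_n}\big(1 + \varepsilon_n\gamma_n + \pi\gamma_n(1 - X_n)\big) - \Delta M_{n+1}, \qquad (14) \]

so that, for $n \ge n_0$, $Z_{n+1} \ge Z_n - \Delta M_{n+1}$, which proves that $(Z_n)_{n\ge n_0}$ is a submartingale.
Now set $\tau_L := \min\{n \ge n_0 : 1 - X_n > L\gamma_{n+1}\}$, $L > 0$. Then the stopped submartingale
$(Z_n^{\tau_L})_{n\ge n_0}$ satisfies

\[ (\Delta Z_{n+1}^{\tau_L})^+ = \mathbf{1}_{\{\tau_L \ge n+1\}}(\Delta Z_{n+1})^+ \le L + \sup_n \|\Delta M_n\|_\infty. \]

Consequently the submartingale $(Z_n^{\tau_L})_{n\ge n_0}$ is bounded, with bounded increments. Hence
it converges ($\mathbb{P}_x$-a.s. and in $L^1(\mathbb{P}_x)$) toward an integrable random variable $Z_\infty^{\tau_L}$.
Furthermore (see jll|) the conditional variance increment process of its martingale part also
converges to a finite random variable as $n \to +\infty$. This reads

\[ \sum_{n=n_0+1}^{\tau_L}\mathbb{E}\big((\Delta M_n)^2 \,|\, \mathcal{F}_{n-1}\big) < +\infty \quad \mathbb{P}_x\text{-a.s.} \]

But, we know from Proposition 1 that

\[ \mathbb{E}\big((\Delta M_n)^2 \,|\, \mathcal{F}_{n-1}\big) \ge p_BX_{n-1}(1 - X_{n-1}). \]

Consequently,

\[ \{X_\infty = 1\} \cap \Big(\bigcup_{p\in\mathbb{N}}\{\tau_p = +\infty\}\Big) \subset \Big\{\sum_n (1 - X_n) < +\infty\Big\}. \]

We conclude by observing that $\bigcup_{p\in\mathbb{N}}\{\tau_p = +\infty\} = \big\{\sup_n \frac{1 - X_n}{\gamma_{n+1}} < +\infty\big\}$. $\diamond$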

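Proof of Proposition 7. We first assume that $\sum_n \gamma_n\varepsilon_n^+ < +\infty$. The proof is based,
as in Lemma 1, on the study of the sequence $\big((1 - X_n)/\gamma_n\big)$. We deduce from (14) that

\[ \frac{1 - X_{n+1}}{\gamma_{n+1}} \le \frac{1 - X_n}{\gamma_n}\big(1 + \varepsilon_n^+\gamma_n + \pi\gamma_n(1 - X_n)\big) - \Delta M_{n+1}. \qquad (15) \]

Hence

\[ \mathbb{E}\Big(\frac{1 - X_{n+1}}{\gamma_{n+1}} \,\Big|\, \mathcal{F}_n\Big) \le \frac{1 - X_n}{\gamma_n}\big(1 + \varepsilon_n^+\gamma_n + \pi\gamma_n(1 - X_n)\big). \qquad (16) \]

We know from Proposition 4 that, on the set $\{X_\infty = 1\}$, we have $\sup_n (1 - X_n)e^{\pi\Gamma_n} < +\infty$,
so that $\gamma_n(1 - X_n) \le C\gamma_ne^{-\pi\Gamma_n}$ for some $C > 0$, and $\sum_n \gamma_n(1 - X_n) < +\infty$. We
now deduce from (16) and a supermartingale argument that, on $\{X_\infty = 1\}$, the sequence
$\big((1 - X_n)/\gamma_n\big)_{n\in\mathbb{N}}$ is almost surely convergent.

On the other hand, with the notation $Z_n = (1 - X_n)/\gamma_n$, we know from (15) that

\[ \Delta M_{n+1} \le Z_n - Z_{n+1} + Z_n\big(\varepsilon_n^+\gamma_n + \pi\gamma_n(1 - X_n)\big). \]

Therefore, on $\{X_\infty = 1\}$ the martingale $(M_n)$ is bounded from above, and, since it has
bounded jumps, we must have $\langle M\rangle_\infty < +\infty$ almost surely. We know from Proposition 1
that $\langle M\rangle_\infty \ge p_B\sum_n X_{n-1}(1 - X_{n-1})$. Hence $\sum_n (1 - X_n) < +\infty$ a.s. on $\{X_\infty = 1\}$.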

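We now assume that $\liminf_n \varepsilon_n > 0$, so that for $n$ large enough (say $n \ge n_0$), we have

\[ \frac{1}{\gamma_{n+1}} - \frac{1}{\gamma_n} - \pi \ge \varepsilon \qquad (17) \]

for some $\varepsilon > 0$. In particular the sequence $(\gamma_n)_{n\ge n_0}$ is non-increasing and, for $n \ge n_0$,

\[ \gamma_n - \gamma_{n+1} \ge (\pi + \varepsilon)\gamma_n\gamma_{n+1}, \]

which implies $\sum_n \gamma_n^2 < +\infty$. We also have, for $n \ge n_0$,

\[ \gamma_{n+1} \le \gamma_n\big(1 - (\pi + \varepsilon)\gamma_{n+1}\big) \le \gamma_ne^{-(\pi + \varepsilon)\gamma_{n+1}}. \]

Therefore, for $k \ge n \ge n_0$,

\[ \gamma_k \le \gamma_ne^{-(\pi + \varepsilon)(\Gamma_k - \Gamma_n)}, \]

and

\[ \sum_{k\ge n}\gamma_k^2e^{\pi\Gamma_k} \le \sum_{k\ge n}\gamma_k\gamma_ne^{-(\pi + \varepsilon)(\Gamma_k - \Gamma_n)}e^{\pi\Gamma_k} = \gamma_ne^{(\pi + \varepsilon)\Gamma_n}\sum_{k\ge n}\gamma_ke^{-\varepsilon\Gamma_k} \]
\[ \le \gamma_ne^{(\pi + \varepsilon)\Gamma_n}\int_{\Gamma_{n-1}}^{\infty}e^{-\varepsilon x}\,dx \le \gamma_n\,\frac{e^{\varepsilon}}{\varepsilon}\,e^{\pi\Gamma_n}. \]

We have thus proved not only that $\sum_n \gamma_n^2e^{\pi\Gamma_n} < +\infty$, but also that

\[ \sum_{k\ge n}\gamma_k^2e^{\pi\Gamma_k} \le C\gamma_ne^{\pi\Gamma_n} \]

for some $C > 0$. It then follows from Proposition 5 that, on the set $\{Y_\infty = 0\}$, $(1 - X_n) \le
C\theta_n\gamma_ne^{\pi\Gamma_n}$, and, using Remark 3, we get $\sup_n (1 - X_n)/\gamma_n < +\infty$ a.s. on $\{Y_\infty = 0\}$. We
complete the proof by applying Lemma 1. $\diamond$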

Remark 7 Assume, with the notation of Proposition 7, that $\liminf_n \varepsilon_n > 0$ and $\sum_n e^{-p_A\Gamma_n} <
+\infty$. This is the case if $\gamma_n = C/(n + C)$, with $\pi C < 1 < p_AC$. Then, we deduce from
Propositions Q and IHl that $0 < \mathbb{P}_x(Y_\infty = 0) < 1$ and that, on $\{Y_\infty = 0\}$, the sequence
$(1 - X_n)e^{p_A\Gamma_n}$ converges to a positive limit, whereas on $\{Y_\infty > 0\}$, $(1 - X_n)e^{\pi\Gamma_n}$ converges
to a positive limit almost surely.

4 A parametric guide to the rates

In this section we will call fast a rate of the algorithm which induces that the error series
converges, i.e. $\sum_n (1 - X_n) < +\infty$ when $X_n \to 1$ and $\sum_n X_n < +\infty$ when $X_n \to 0$. Other
rates will be considered as slow.

Assume (at least for large enough $n$) that

\[ \gamma_n = \Big(\frac{C}{C + n}\Big)^{\alpha}, \qquad 0 < \alpha \le 1, \quad C > 0. \]

Then, the algorithm behaves as follows:

• If ($\alpha \in (0, 1)$) or ($\alpha = 1$ and $Cp_B > 1$), then the algorithm is fallible with positive
probability from any $x \in [0, 1)$ (note that this probability is lower than 1 if $x \in (0, 1)$).
When failing, it always goes to 0 at a fast rate ($n^{-Cp_B}$ if $\alpha = 1$). This follows from
Proposition [21

• If $\alpha = 1$ and $C \le \frac{1}{p_B}$, the algorithm is infallible from any $x \in (0, 1]$. This follows
from Proposition 3(b).

As concerns rates one has

• If $\alpha = 1$ and $C \ge \frac{1}{\pi}$, then the - fast - rate of convergence is $n^{-Cp_A}$ on $\{X_n \to 1\}$.
This follows from Proposition 7(a).

• If $\alpha = 1$ and $\frac{1}{p_A} < C < \frac{1}{\pi}$, then exactly two rates of convergence occur with positive
$\mathbb{P}_x$-probability on $\{X_n \to 1\}$: a slow one - $n^{-C\pi}$ - and a fast one - $n^{-Cp_A}$. This
follows from Proposition El and Ef6) (see Remark 7).

• If $\alpha = 1$ and $C \le \frac{1}{p_A}$, then (the algorithm is infallible from any $x \in (0, 1]$) but only
the slow rate of convergence survives, i.e. $n^{-C\pi}$ on $\{X_n \to 1\}$. This follows from
Proposition |H1

Note as corollaries that,

- when $2p_B \le p_A$ (then $\frac{1}{\pi} \le \frac{1}{p_B}$): it is possible to choose $C \in \big[\frac{1}{\pi}, \frac{1}{p_B}\big]$ so that the
algorithm is simultaneously infallible and converging with a fast rate. This is possible
because in some sense $p_A$ and $p_B$ are remote enough. The fastest achievable rate is $n^{-p_A/p_B}$
(with $C = \frac{1}{p_B}$). Of course such a specification is purely theoretical since $p_A$ and $p_B$ are
supposed to be unknown.

- when $p_B < p_A < 2p_B$ (then $\frac{1}{p_B} < \frac{1}{\pi}$): there is no access to fast converging rates
within infallibility, because $p_A$ and $p_B$ are too close to each other.

- in any case, when no information is available on the parameters $p_A$ and $p_B$, the
"blind" choice $C = 1 < \frac{1}{p_A}$ which ensures infallibility induces a slow rate of convergence,
namely $n^{-\pi}$. In fact this rate can be very poor when $p_A$ and $p_B$ get close to each other.
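The case distinction of this section can be summarized as a small decision function. The sketch below is illustrative only: the name `classify` and the returned dictionary layout are hypothetical, the thresholds are those listed in this section for $\alpha = 1$, and the rates toward 1 are left unspecified for $\alpha < 1$. Comparisons exactly at a threshold are subject to floating-point rounding.

```python
def classify(c, p_a, p_b, alpha=1.0):
    """Regime of gamma_n = (c / (c + n)) ** alpha, following the list of this section."""
    if not (0.0 < p_b < p_a < 1.0):
        raise ValueError("expected 0 < p_b < p_a < 1")
    if not (0.0 < alpha <= 1.0 and c > 0.0):
        raise ValueError("expected 0 < alpha <= 1 and c > 0")
    pi = p_a - p_b
    # Fallible iff alpha < 1, or alpha = 1 and c * p_b > 1.
    fallible = alpha < 1.0 or c * p_b > 1.0
    if alpha < 1.0:
        rates = None  # rates toward 1 are only listed for alpha = 1
    elif c * pi >= 1.0:
        rates = ("fast",)
    elif c * p_a > 1.0:
        rates = ("slow", "fast")
    else:
        rates = ("slow",)
    return {"fallible": fallible, "rates_to_target": rates}
```

For instance, `classify(1.0, 0.6, 0.4)` corresponds to the "blind" choice $C = 1$: the algorithm is infallible and only the slow rate is available.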

At this point the conclusion can be the following: the higher the parameter $C$ is, the
faster the algorithm goes. But if $C$ is too high, it may go wrong.

- One further point to be noticed is that what we called the slow rate - $e^{-\pi\Gamma_n}$ - for the
algorithm is but the rate of its mean deterministic version (see [HI for details). So, even
when it is infallible (that is, converges to the same limit as its mean version), it always
converges at least as fast as this deterministic procedure (which is of no practical interest
since its implementation would require $p_A$ and $p_B$ to be known). When no information is
available on the parameters $p_A$ and $p_B$, this is the rate which is actually obtained.

As a conclusion, the convergence rate behaviour of this stochastic approximation algorithm
is completely non-standard. Thus, from a mathematical viewpoint, one last feature
to be noticed is the unusual "spectrum" of the rates, since the switching from one rate to
another takes place "progressively", with a range of values of the gain parameter $C$
for which two different rates are achieved with positive probability.